Voice activated device

ABSTRACT

A voice activated camera is described which allows users to take remote photographs by speaking one or more keywords. In a preferred embodiment, a speech processing unit is provided which is arranged to detect extended periodic signals from a microphone of the camera. A control unit is also provided to control the taking of a photograph when such an extended periodic component is detected by the speech processing unit.

The present invention relates to an apparatus and method for controlling a remote device by voice. The invention can be used, for example, to control the remote taking of photographs or movies by a camera, camcorder or other image capture device.

To take a photograph with a camera, a user usually holds the camera, looks through a viewfinder situated on the back of the camera to frame the scene and then presses a button to cause a shutter in the camera to release, thereby exposing light onto photographic film or a light capturing electronic device. However, in situations wherein the user desires to be included in the photograph, e.g. group photographs, portrait photographs etc., the user will typically be some distance remote from the camera and cannot take a photograph in the usual manner. The term “remote photograph” will be used herein to describe the situation where the user desires to take a photograph without making physical contact with the camera. In this situation, the user must have a way to activate the shutter without having to manually press a button on the camera.

It is known in the art to provide a timer on the camera in order to allow a remote photograph to be taken. With this camera, the user indicates that a remote photograph is to be taken by activating a specific switch on the camera and after a predetermined period of time a photograph is taken. However, this approach is inflexible and unsatisfactory for many scenarios since, if the timer period is too short, the photograph is taken before the user is ready, and if the timer period is too long, the user is left waiting for the camera to take the photograph. Further, if more than one remote photograph is to be taken, then the user must return to the camera each time in order to reset the timer period and then return to the desired location before the next photograph is taken.

It is also known in the art to provide a camera with a remote control. In this case, a remote photograph can be taken without reliance on a timer. However, the presence of the remote control adds to the overall cost of the camera. Further, the remote control is inconvenient since the user must carry it in addition to the camera in order to take remote photographs.

It is also known in the art to provide a camera with speech activated remote photograph taking, in which the camera is programmed to detect a specific spoken keyword using an automatic speech recognition unit. Such cameras have the advantage of not requiring a remote control whilst still allowing a photograph to be taken when the user is ready. However, these cameras may be unsuitable in fairly noisy situations where the keyword may not be detected due to corruption of the user's speech signal by background noise and attenuation. Further, automatic speech recognition is computationally expensive and requires memory for storing the word models.

It is an aim of the present invention to provide an alternative technique of processing input speech signals to detect speech for controlling a device.

It is a further aim of the present invention to provide an alternative technique of allowing a user to activate a remote photograph function on an image capture device.

According to one aspect, the present invention provides an apparatus for controlling a device, the apparatus comprising: means for dividing a received speech signal into a sequence of speech frames; means for processing each speech frame to determine a measure of the periodicity of the speech within the speech frame; means for processing the periodicity measures from a plurality of successive speech frames to detect an extended periodic portion of speech; and means for controlling the device in dependence upon the detection of said extended periodic portion.

According to another aspect, the present invention provides an image capture device comprising: means for receiving a speech signal; means adapted to detect an extended periodic portion within the received speech signal; and means for controlling the image capture device to capture an image in dependence upon a detection made by said detecting means.

According to another aspect, the present invention provides a method of controlling a device, the method comprising: receiving a speech signal; dividing the speech signal into a sequence of speech frames, each speech frame representing a time portion of the speech signal; processing each speech frame to determine a measure of periodicity of the portion of the speech signal represented by the speech frame; detecting an extended periodic portion within the received speech signal using the periodicity measures from a plurality of successive speech frames; and controlling said device in dependence upon a detection made by said detecting step.

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings in which:

FIG. 1 schematically illustrates a group of people having their photograph taken, a member of the group saying the word “cheeeese” and a camera located remote from the group taking the photograph;

FIG. 2 is a schematic block diagram illustrating the main components of the camera shown in FIG. 1;

FIG. 3 is a schematic block diagram illustrating the main components of a speech processing unit shown in FIG. 2;

FIG. 4 is a plot of a typical speech waveform generated by a microphone of the camera, which illustrates the way in which the speech signal is divided into a number of non-overlapping frames;

FIG. 5 is a schematic block diagram illustrating the main components of a frame periodicity determining unit shown in FIG. 3;

FIG. 6 is a plot illustrating an output of an auto-correlation unit shown in FIG. 5 for a frame of speech;

FIG. 7 is a schematic block diagram illustrating the main components of an extended periodicity determining unit shown in FIG. 3;

FIG. 8 is a schematic block diagram illustrating the main components of an alternative speech processing unit to the one shown in FIG. 2.

OVERVIEW

FIG. 1 shows a group of five people 1-1 to 1-5 who are posing for a photograph. When the group is ready for their photograph to be taken, one of them 1-1 says the word “cheese” with an extended vowel portion “eeee”. In this embodiment, the camera 3 is operable to detect this extended vowel portion and, if detected, to take a photograph.

FIG. 2 is a schematic block diagram which shows the main components of the camera 3. In normal use, a user controls the camera 3 via a user input device 31 (such as a button or dial). This user input is passed to a camera control unit 33 which controls the camera, for example to control a shutter 35 which allows light onto photographic film or a light sensitive electronic component such as a CCD or CMOS sensor (not shown). The camera control unit 33 also controls a user output device 37 (such as an LCD display or LED lights) in order to indicate camera and photograph status information (such as camera power, light intensity, flash mode etc.) to the user. As shown in FIG. 2, the camera 3 also includes a microphone 39 for converting a user's speech into corresponding electrical speech signals; and a speech processing unit 41 which processes the electrical speech signals to detect the presence of a keyword in the user's speech and which informs the camera control unit 33 accordingly.

SPEECH PROCESSING UNIT

As discussed above, the speech processing unit 41 is arranged to detect keywords spoken by the user in order to control the taking of remote photographs. In this embodiment, the speech processing unit does not employ a “conventional” automatic speech recognition type keyword spotter which compares the spoken speech with stored models to identify the presence of one of the keywords. Instead, the speech processing unit 41 used in this embodiment is arranged to detect a sustained periodic signal within the input speech, such as would occur if the user says the word “cheeeese” or some other similar word. The inventor has found that because of the strong periodic nature of such a sustained vowel sound, the speech processing unit 41 can still detect the sound even at very low signal-to-noise ratios.

The way in which the speech processing unit 41 operates in this embodiment will now be explained with reference to FIGS. 3 to 7.

FIG. 3 illustrates the main functional blocks of the speech processing unit 41 used in this embodiment. The input signal (S(t)) received from the microphone 39 is sampled (at a rate of just over 11 kHz) and digitised by an analogue-to-digital (A/D) converter 101. Although not shown, the speech processing unit 41 will also include an anti-aliasing filter before the A/D converter 101, to prevent aliasing effects occurring due to the sampling. The sampled signal is then filtered by a bandpass filter 103 which removes unwanted frequency components. Since voiced sounds (as opposed to fricative sounds) are generated by the vibration of the user's vocal cords, the smallest fundamental frequency (pitch) of the periodic signal to be detected will be approximately 100 Hertz. Therefore, in this embodiment, the bandpass filter 103 is arranged to remove frequency components below 100 Hertz which will not contribute to the desired periodic signal. Also, the bandpass filter 103 is arranged to remove frequencies above 500 Hertz, which reduces broadband noise from the signal and therefore improves the signal-to-noise ratio. The input speech is then divided into non-overlapping equal length frames of speech samples by a framing unit 105. In particular, in this embodiment the framing unit 105 extracts a frame of speech samples every 23 milliseconds. With the sampling rate used in this embodiment, this results in each frame having 256 speech samples. FIG. 4 illustrates the sampled speech signal (S(n), shown as a continuous signal for ease of illustration) and the way that the speech signal is divided into non-overlapping frames.
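The framing step can be illustrated with a minimal Python sketch. It assumes a sampling rate of 11025 Hz (one common rate that is "just over 11 kHz"; the text does not give the exact figure), and the function name `split_into_frames` is hypothetical. Dropping an incomplete trailing frame is also an assumption made for the sketch.

```python
SAMPLE_RATE_HZ = 11025   # assumption: the text only says "just over 11 kHz"
FRAME_LENGTH = 256       # samples per frame, as stated in the text


def split_into_frames(samples, frame_length=FRAME_LENGTH):
    """Split a sample sequence into non-overlapping, equal-length frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption of this sketch).
    """
    n_frames = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length]
            for i in range(n_frames)]


# Duration of one frame in milliseconds at the assumed sampling rate.
frame_duration_ms = 1000.0 * FRAME_LENGTH / SAMPLE_RATE_HZ

# Example: 1000 dummy samples give three complete frames.
frames = split_into_frames(list(range(1000)))
```

At the assumed rate, 256 samples last a little over 23 milliseconds, which agrees with the frame period stated above.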

As shown in FIG. 3, each frame f_(i) of speech samples is then processed by a frame periodicity determining unit 107 which processes the speech samples within the frame to calculate a measure (v_(i)) of the degree of periodicity of the speech within the frame. A high degree of periodicity within a frame is indicative of a voiced sound when the vocal cords are vibrating. A low degree of periodicity is indicative of noise or fricative sounds. The calculated periodicity measure (v_(i)) is then stored in a first-in-first-out buffer 109. In this embodiment, the buffer 109 can store frame periodicity measures for forty-four consecutive frames, corresponding to just over one second of speech. Each time a new frame periodicity measure is added to the buffer 109, an extended periodicity determining unit 111 processes all of the forty-four periodicity measures in the buffer 109 to determine whether or not a sustained periodic sound is present within the detection window represented by the forty-four frames.

When the extended periodicity determining unit 111 detects a sustained periodic sound within the speech signal, it passes a signal to the camera control unit 33 confirming the detection. As discussed above, the camera control unit 33 then controls the operation of the camera 3 to take the photograph at the appropriate time.
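One simple way to picture the first-in-first-out buffer 109 is as a fixed-length queue. The sketch below uses Python's `collections.deque`; the class name `PeriodicityBuffer` and the 11025 Hz sampling rate used for the duration check are assumptions, not details taken from the text.

```python
from collections import deque

WINDOW_FRAMES = 44  # number of frame periodicity measures held, per the text


class PeriodicityBuffer:
    """Fixed-length FIFO of frame periodicity measures (hypothetical helper)."""

    def __init__(self, size=WINDOW_FRAMES):
        # A deque with maxlen discards the oldest entry automatically.
        self._values = deque(maxlen=size)

    def push(self, v):
        self._values.append(v)

    def is_full(self):
        return len(self._values) == self._values.maxlen

    def window(self):
        """Return the current detection window, oldest measure first."""
        return list(self._values)


buf = PeriodicityBuffer()
for i in range(50):
    buf.push(i / 100.0)

# Window duration in seconds, assuming 256-sample frames at 11025 Hz.
window_seconds = WINDOW_FRAMES * 256 / 11025
```

After fifty pushes the buffer holds only the most recent forty-four values, and at the assumed rate the window spans just over one second.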

FRAME PERIODICITY DETERMINING UNIT

As those skilled in the art will appreciate, various techniques can be used to determine a measure of the periodicity of the speech within each speech frame. However, the main components of the particular frame periodicity determining unit 107 used in this embodiment are shown in FIG. 5. As shown, the frame periodicity determining unit 107 includes an auto-correlation determining unit 1071 which receives the current speech frame f_(i) from the framing unit 105 and which determines the auto-correlation of the speech samples within the frame. In particular, the auto-correlation determining unit 1071 calculates the following function:

$\begin{matrix}{A(L) = \frac{1}{N - L}\sum\limits_{j = 0}^{N - L - 1} x(j)\, x(j + L)} & (1)\end{matrix}$

where x(j) is the j^(th) sample within the current frame, N is the number of samples in the frame, j=0 to N-1 and L=0 to N-1.

The value of A(L) for L=0 is equal to the signal energy and for L>0 it corresponds to shifting the signal by L samples and correlating it with the original signal. A periodic signal shows strong peaks in the auto-correlation function for values of L that are multiples of the pitch period. In contrast, non-periodic signals do not have strong peaks.
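Equation (1) can be written directly as a short Python function. This is a plain, unoptimised sketch; the function name `autocorrelation` and the sine-wave test frame are illustrative assumptions.

```python
import math


def autocorrelation(frame, lag):
    """Equation (1): mean of x(j) * x(j + lag) over the N - lag valid samples."""
    n = len(frame)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(frame)")
    total = sum(frame[j] * frame[j + lag] for j in range(n - lag))
    return total / (n - lag)


# Illustrative frame: a sine wave repeating every 64 samples.
period = 64
frame = [math.sin(2 * math.pi * j / period) for j in range(256)]

a_zero = autocorrelation(frame, 0)              # signal energy
a_period = autocorrelation(frame, period)       # shift by one full period
a_half = autocorrelation(frame, period // 2)    # shift by half a period
```

For this frame the value at a shift of one period matches the value at zero shift, while a shift of half a period gives a negative value, which is the behaviour described above for a periodic signal.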

FIG. 6 shows the auto-correlation function (A_(i)(L)) for a frame of speech f_(i) representing a speech signal which is periodic and which repeats approximately every 90 samples. As shown in FIG. 6, the auto-correlation function has a peak at L=90 and a further peak around L=180. Further, the value of the auto-correlation function at L=90 is approximately the same as the value at L=0, indicating that the signal is strongly periodic.

The fundamental frequency or pitch of voiced speech signals varies between 100 and 300 Hertz. Therefore, a peak in the auto-correlation function is expected between L_(LOW)=F_(s)/300 and L_(HIGH)=F_(s)/100, where F_(s) is the sampling frequency of the input speech signal. Consequently, in this embodiment, the auto-correlation function output by the auto-correlation determining unit 1071 is input to a peak determining unit 1073 which processes the auto-correlation values between A(L_(LOW)) and A(L_(HIGH)) to identify the peak value (A(L_(MAX))) within this range. In this embodiment, with a sampling rate of just over 11 kHz, the value of L_(LOW) is 37 and the value of L_(HIGH) is 111. This search range of the peak determining unit 1073 is illustrated in FIG. 6 by the vertical dashed lines, which also shows the peak occurring at L_(MAX)=90. The auto-correlation values A(0) and A(L_(MAX)) are then passed from the peak determining unit 1073 to a periodicity measuring unit 1075 which is arranged to generate a normalised frame periodicity measure for the current frame (f_(i)) by calculating:

$\begin{matrix}{v_{i} = \frac{A_{i}(L_{MAX})}{A_{i}(0)}} & (2)\end{matrix}$

where v_(i) will be approximately one for a periodic signal and close to zero for a non-periodic signal.
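The peak search and equation (2) can be sketched as follows. The 11025 Hz sampling rate is an assumption, and rounding the lag limits upwards is also an assumption, chosen because it reproduces the stated values of 37 and 111. The names `lag_search_range` and `frame_periodicity` are hypothetical, as is the handling of a silent frame.

```python
import math
import random

SAMPLE_RATE_HZ = 11025  # assumption: the text only says "just over 11 kHz"


def lag_search_range(sample_rate_hz, f_min_hz=100.0, f_max_hz=300.0):
    """Lag limits for pitches between f_min_hz and f_max_hz (rounded up)."""
    lag_low = math.ceil(sample_rate_hz / f_max_hz)
    lag_high = math.ceil(sample_rate_hz / f_min_hz)
    return lag_low, lag_high


def autocorrelation(frame, lag):
    """Equation (1)."""
    n = len(frame)
    return sum(frame[j] * frame[j + lag] for j in range(n - lag)) / (n - lag)


def frame_periodicity(frame, lag_low, lag_high):
    """Equation (2): return (v, L_MAX) for one frame."""
    energy = autocorrelation(frame, 0)
    if energy <= 0.0:
        return 0.0, lag_low  # silent frame: treated as non-periodic here
    values = {lag: autocorrelation(frame, lag)
              for lag in range(lag_low, lag_high + 1)}
    best_lag = max(values, key=values.get)
    return values[best_lag] / energy, best_lag


lag_low, lag_high = lag_search_range(SAMPLE_RATE_HZ)

# A tone repeating every 90 samples (as in FIG. 6) and a white-noise frame.
tone = [math.sin(2 * math.pi * j / 90) for j in range(256)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]

v_tone, lag_tone = frame_periodicity(tone, lag_low, lag_high)
v_noise, _ = frame_periodicity(noise, lag_low, lag_high)
```

With these inputs the tone gives a measure close to one with its peak at or next to a lag of 90, while the noise frame gives a much smaller measure. Because only lags between the two limits and the zero lag are used, the other auto-correlation values need not be computed.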

EXTENDED PERIODICITY DETERMINING UNIT

The operation of the extended periodicity determining unit 111 will now be described in more detail with reference to FIG. 7, which shows a block diagram of the main components of the extended periodicity determining unit 111 used in this embodiment. As discussed above, the purpose of the extended periodicity determining unit 111 is to process the periodicity measures stored in the buffer 109 to detect a sustained voiced (i.e. periodic) signal with a minimum duration of one second. It does this by checking the variability of the periodicity measures currently stored within the buffer 109. In this embodiment, the extended periodicity determining unit 111 generates two different measures of this variability. In particular, in this embodiment, the extended periodicity determining unit 111 includes a first periodicity measure processing unit 1111 which is operable to calculate the following variability measure from the periodicity measures (v_(i)) stored in the buffer 109:

$\begin{matrix}{m = \left| 1 - \frac{1}{W}\sum\limits_{j = 1}^{W} v_{j} \right|} & (3)\end{matrix}$

where W is the number of periodicity measures in the buffer 109 (and is the value forty-four in this embodiment). The value of m should therefore be close to zero for a sustained periodic signal and should be close to one for a non-periodic signal.

The extended periodicity determining unit 111 also includes a second periodicity measure processing unit 1113 which also processes the periodicity measures stored in the buffer 109 to generate the following second variability measure:

$\begin{matrix}{s = \sum\limits_{j = 1}^{W} \left| v_{j} - \frac{1}{W}\sum\limits_{k = 1}^{W} v_{k} \right|} & (4)\end{matrix}$

The value of s should also be small (close to zero) for a sustained periodic signal and should be larger for signals containing both periodic and aperiodic portions. As shown in FIG. 7, the above variability measures are then output by the respective processing units to a combination unit 1115 which, in this embodiment, linearly combines the two variability measures as follows:

Am+s   (5)

where A is an appropriate scale factor. The combined variability measure defined by equation (5) above is then compared with a predetermined threshold value (Th) in a threshold unit 1117 and, based on the comparison result, a decision is made by a decision unit 1119 as to whether or not the speech in the current detection window corresponds to an extended periodic signal. In particular, if the speech signal does correspond to an extended periodic signal, then the value of Am+s should be less than the threshold, whereas if it does not, then Am+s should be greater than the threshold. As those skilled in the art will appreciate, suitable values for the scale factor A and the threshold Th can be determined empirically in advance using recordings of extended periodic signals and environmental noise.
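The two variability measures and their linear combination can be sketched in Python as below. The text says that the scale factor A and the threshold Th are found empirically, so the values 10.0 and 5.0 used here are placeholders chosen only to make the example run; the function names are likewise hypothetical.

```python
def deviation_from_one(measures):
    """Equation (3): absolute difference between 1 and the mean measure."""
    return abs(1.0 - sum(measures) / len(measures))


def spread_about_mean(measures):
    """Equation (4): summed absolute deviation of each measure from the mean."""
    mean = sum(measures) / len(measures)
    return sum(abs(v - mean) for v in measures)


def is_extended_periodic(measures, scale, threshold):
    """Equation (5) followed by the threshold test: A*m + s < Th."""
    combined = scale * deviation_from_one(measures) + spread_about_mean(measures)
    return combined < threshold


SCALE_A = 10.0       # placeholder value, not taken from the text
THRESHOLD_TH = 5.0   # placeholder value, not taken from the text

sustained = [0.95] * 44               # strongly periodic throughout the window
mixed = [0.95] * 22 + [0.1] * 22      # periodic for only half of the window
noise_only = [0.1] * 44               # never periodic

results = [is_extended_periodic(w, SCALE_A, THRESHOLD_TH)
           for w in (sustained, mixed, noise_only)]
```

The example shows why both measures are useful: the noise-only window has almost no spread but a large m, while the mixed window is rejected mainly because of its large s.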

Therefore, as those skilled in the art will appreciate, each time a new periodicity measure for the next speech frame is input into the buffer 109, the extended periodicity determining unit 111 determines whether or not the speech in the new detection window corresponds to an extended periodic signal. However, if the peak value of Am+s happens to be close to the threshold value, then several detections may be triggered in a short time. In order to avoid this problem, the decision unit 1119 does not output a new decision indicating an extended periodic portion for one second after the previous such decision was made.
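The one-second hold-off can be sketched as a small state machine that counts frames since the last positive decision. The class name `DecisionUnit`, the 11025 Hz sampling rate and the choice to express the hold-off as a whole number of frames are all assumptions of this sketch.

```python
import math


class DecisionUnit:
    """Suppress repeated detections for a hold-off period (hypothetical sketch)."""

    def __init__(self, sample_rate_hz=11025, frame_length=256, hold_off_s=1.0):
        # Number of whole frames that covers the hold-off period.
        self._hold_off_frames = math.ceil(
            hold_off_s * sample_rate_hz / frame_length)
        self._frames_since_detection = None  # None until the first detection

    def update(self, below_threshold):
        """Call once per frame; returns True only for a newly reported detection."""
        if self._frames_since_detection is not None:
            self._frames_since_detection += 1
        in_hold_off = (self._frames_since_detection is not None
                       and self._frames_since_detection < self._hold_off_frames)
        if below_threshold and not in_hold_off:
            self._frames_since_detection = 0
            return True
        return False


# Example: the threshold test passes on every one of 100 consecutive frames.
unit = DecisionUnit()
outputs = [unit.update(True) for _ in range(100)]
detection_frames = [i for i, fired in enumerate(outputs) if fired]
```

Even though the threshold test passes on every frame in this example, detections are reported only once per hold-off period rather than once per frame.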

Summary and Advantages

As those skilled in the art will appreciate, a system has been described above which allows the user to take remote photographs using only their voice. Rather than use an automatic speech recognition system within the camera, the camera includes a speech processing system which is arranged to detect a sustained periodic signal within the user's speech and, when detected, causes the camera to take the photograph. The system therefore has the following advantages over existing camera systems:

(1) The camera shutter can be operated when the user is ready for the photograph to be taken, in contrast to a timer system where the user must wait for the timer.
(2) The camera shutter can be operated any number of times, in contrast to a timer system where the user must reset the timer for each photograph.
(3) The camera shutter can be operated without the need for an additional remote control device that is inconvenient and adds to the cost of the camera.
(4) Periodic signals can be detected even when contaminated with high levels of noise. This is important because the user might be several metres from the microphone of the camera and the signal-to-noise ratio may therefore be small. Current automatic speech recognition systems perform poorly at such low levels of signal-to-noise ratio.
(5) The system does not require localisation for each language since the user can be instructed to utter any word that contains a sustained vowel sound. In contrast, existing automatic speech recognition techniques would require speech data to be collected for each language and would significantly increase the cost of the camera.
(6) The processing of the acoustic signal is relatively simple and does not therefore have very high computational and memory resource requirements compared with those of a full automatic speech recognition system.

Alternatives and Modifications

An embodiment has been described above of a voice-activated camera which allows a user to take remote photographs by speaking a keyword. A technique has been described for detecting the keyword by detecting extended periodic sounds uttered by the user. As those skilled in the art will appreciate, the particular keyword detection algorithm described above can be used to control other devices as well as cameras.

In the above embodiments, the speech processing unit was arranged to divide the speech signal into a number of successive frames and to calculate a measure of the periodicity of the speech within each frame. This periodicity measure was determined by performing an auto-correlation of the speech signal within the frame. As those skilled in the art will appreciate, because of the equivalent relationship between correlation in the time domain and multiplication in the frequency domain, a similar periodicity measure can be determined by taking the Fourier transform of the speech samples within each frame, squaring the transform and then looking for the periodic peaks within the squared transform. The way in which such an alternative embodiment would operate will be apparent to those skilled in the art and will not be described further.
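The time-domain and frequency-domain relationship mentioned above can be checked numerically. The sketch below is not the alternative embodiment itself: it only demonstrates, with a naive discrete Fourier transform, that the inverse transform of the squared magnitude spectrum equals the circular auto-correlation of the frame. Note that the circular form wraps around the frame end and so differs from the normalised form of equation (1). All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2), adequate for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def circular_autocorrelation_direct(x):
    """Sum of x(t) * x((t + lag) mod N) for every lag, computed in the time domain."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]


def circular_autocorrelation_via_spectrum(x):
    """Same quantity obtained by inverse-transforming the squared magnitude spectrum."""
    n = len(x)
    power = [abs(c) ** 2 for c in dft(x)]
    return [sum(power[k] * cmath.exp(2j * cmath.pi * k * lag / n)
                for k in range(n)).real / n
            for lag in range(n)]


# Illustrative 64-sample frame containing two periodic components.
frame = [math.sin(2 * math.pi * t / 16) + 0.3 * math.cos(2 * math.pi * t / 5)
         for t in range(64)]

direct = circular_autocorrelation_direct(frame)
via_spectrum = circular_autocorrelation_via_spectrum(frame)
max_difference = max(abs(a - b) for a, b in zip(direct, via_spectrum))
```

In practice a fast Fourier transform would replace the naive transform used here; the naive form is kept only so that the sketch has no dependencies.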

In the above embodiment, a filter was used after the analogue-to-digital converter in order to remove higher frequency speech components to thereby improve the signal-to-noise ratio. In an alternative embodiment, the bandpass filter may be replaced with a Wiener filter which is tuned to the spectral shape of an extended vowel sound. Such a filter will therefore improve the signal-to-noise ratio for such extended vowel sounds, making the algorithm more sensitive to extended vowels and less sensitive to other sounds. The way in which such Wiener filters work will be apparent to those skilled in the art of signal processing and will not, therefore, be described further.

In the above embodiment, the speech processing unit can also detect periodic signals having a fundamental frequency greater than 300 Hertz. In order to mitigate this problem, the speech processing unit may also include a pitch detection circuit which processes the speech signal in order to estimate the pitch of the user's signal. The speech processing unit can then use this estimated pitch to reject sounds outside of the 100 to 300 Hz range. Additionally, if such a pitch detector is provided, then the extended periodicity determining unit may also check to ensure that the estimated pitch does not change greatly during the current detection window (corresponding to the current set of periodicity measures within the buffer). This may help to reject musical sounds which also include periodic components but whose fundamental frequency changes over time.
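The two pitch checks described above might be sketched as follows. The pitch estimates are assumed to come from a separate pitch detector that is not modelled here, and the limit of 10% on the allowed pitch change is a placeholder, since the text does not say how large a change is acceptable. The function name is hypothetical.

```python
def pitch_is_acceptable(pitch_estimates_hz, f_min_hz=100.0, f_max_hz=300.0,
                        max_relative_change=0.1):
    """Accept a window only if every pitch estimate is in range and steady.

    max_relative_change is a placeholder limit, not a value from the text.
    """
    if not pitch_estimates_hz:
        return False
    lowest = min(pitch_estimates_hz)
    highest = max(pitch_estimates_hz)
    if lowest < f_min_hz or highest > f_max_hz:
        return False  # pitch outside the expected range for voiced speech
    return (highest - lowest) <= max_relative_change * lowest


steady_vowel = [200.0] * 44                             # constant 200 Hz
high_tone = [400.0] * 44                                # steady but above 300 Hz
glide = [150.0 + 100.0 * i / 43 for i in range(44)]     # 150 Hz rising to 250 Hz

checks = [pitch_is_acceptable(p) for p in (steady_vowel, high_tone, glide)]
```

The glide stays within the 100 to 300 Hz range but is rejected by the stability check, which corresponds to the musical-sound case mentioned above.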

In the above embodiments, the speech processing unit did not use any spectral shape information to detect the extended vowel sound uttered by the user. FIG. 8 illustrates an embodiment where the speech processing unit uses such spectral shape information to detect the extended vowel sound. The same reference numbers have been given to components that have equivalent functionality as in the first embodiment and these will not be described again. As shown, in this embodiment, the speech processing unit includes a spectral shape determining unit 113 which processes the speech samples within each frame to determine a vector representing the spectral shape of the speech within the frame. Any standard spectral representation may be used such as the cepstral or LPC spectral representations.

The spectral parameters for the current frame generated by the spectral shape determining unit 113 are then input to the first-in-first-out buffer 109. The sets of spectral parameters stored in the buffer 109 are then compared by the comparison unit 115 with a spectral voicing model 117 for a vowel sound. The result of the comparison is then processed by the extended periodicity determining unit 111, again to determine if an extended periodic sound is present within the input speech. Further, the extended periodicity determining unit may be arranged to control the detection by checking that the spectral shape parameters for the frames in the buffer 109 do not change by more than a predetermined amount.

In the embodiment described above, a periodicity measure was calculated for each frame of speech using an auto-correlation calculation. As those skilled in the art will appreciate, other periodicity measures can be used. For example, the Average Magnitude Difference Function (AMDF) method could be used which calculates the following function:

$\begin{matrix}{\frac{1}{N}\sum\limits_{j} \left| x_{j} - x_{j - n} \right|} & (6)\end{matrix}$

where n is the shift applied to the frame and the sum is taken over the samples x_(j) of the frame.

As those skilled in the art will appreciate, this function is faster than auto-correlation to implement in integer arithmetic since it does not involve any multiplication.
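A minimal Python sketch of the AMDF is given below. Summing only over the samples for which both x_j and x_(j-n) lie inside the frame is an assumption of the sketch, since the summation limits are not legible in equation (6); the function name `amdf` is also hypothetical.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference for one lag.

    Sums |x(j) - x(j - lag)| over the samples where both terms lie inside
    the frame (an assumption) and divides by the frame length N.
    """
    n = len(frame)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(frame)")
    return sum(abs(frame[j] - frame[j - lag]) for j in range(lag, n)) / n


# Illustrative frame: a sine wave repeating every 64 samples.
frame = [math.sin(2 * math.pi * j / 64) for j in range(256)]

at_period = amdf(frame, 64)        # shift by one full period
at_half_period = amdf(frame, 32)   # shift by half a period
```

Unlike the auto-correlation, the AMDF of a periodic signal has a minimum (ideally zero) at shifts equal to multiples of the pitch period, so a detector based on it would search for the deepest dip rather than the highest peak.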

In the embodiment described above, the extended periodicity measure determining unit calculated two measures that represented the variation of the periodicity measures within the detection window (i.e. for the frames within the buffer). As those skilled in the art will appreciate, it is not essential to use both of these variation measures. Further, it is not essential to use the particular variation measures that are calculated. For example, the variation measures given in equations (3) and (4) may be modified to replace the absolute value operations with a square operation. Further, in an alternative embodiment only one of the variation measures may be used. Additionally, where more than one variation measure is used, it is not essential to combine these in a linear manner. Instead, the variation measures may be combined in a non-linear way by, for example, a neural network.

In the embodiment described above, the frame periodicity determining unit calculated an auto-correlation function for the speech samples within each frame in the current detection window. However, since the peak determining unit searches for a peak within the auto-correlation function between A(L_(LOW)) and A(L_(HIGH)), it is only necessary for the auto-correlation determining unit to calculate the auto-correlation function between these values and to calculate A(0).

In the embodiment described above, the frame periodicity determining unit determined a measure of the periodicity of the speech within each frame by using an auto-correlation calculation. Alternatively, the camera may be pre-trained by the user and may store a frame of speech corresponding to a voiced portion uttered by the user. In this case, the frame periodicity determining unit may perform a cross correlation between each frame in the received speech and the stored frame from the training speech. Although such an embodiment requires a user training routine, it offers the advantage that the camera will only detect periodic signals whose fundamental frequency matches the pitch of the training speech. The system will therefore be more robust to similar periodic signals coming from different users or from different sources.

In the above embodiment, the periodicity measures for the frames in the current detection window were combined and then the combined measure was compared with a predetermined threshold. Alternatively, each periodicity measure may be compared with the threshold and a running score kept of the number of periodicity measures in the current detection window which are greater than or less than the threshold value. The extended periodicity determining unit can then determine if there is an extended periodic speech signal from this running total.
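The counting alternative can be sketched as below. The per-frame threshold of 0.7 and the requirement that at least 40 of the 44 frames exceed it are placeholder values chosen for illustration; the text does not specify either number, and the function names are hypothetical.

```python
def count_periodic_frames(measures, frame_threshold):
    """Running score: how many measures in the window exceed the threshold."""
    return sum(1 for v in measures if v > frame_threshold)


def is_extended_periodic_by_count(measures, frame_threshold, min_count):
    """Declare an extended periodic signal when enough frames look periodic."""
    return count_periodic_frames(measures, frame_threshold) >= min_count


FRAME_THRESHOLD = 0.7   # placeholder value, not taken from the text
MIN_COUNT = 40          # placeholder value, not taken from the text

mostly_periodic = [0.9] * 42 + [0.3] * 2    # two noisy frames in the window
half_periodic = [0.9] * 22 + [0.3] * 22

decisions = [is_extended_periodic_by_count(w, FRAME_THRESHOLD, MIN_COUNT)
             for w in (mostly_periodic, half_periodic)]
```

One property of this approach is that a few badly corrupted frames cost only a few counts, whereas in the combined measure a single outlying value affects both m and s.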

In the above embodiment, a camera has been described having a number of processing modules. As those skilled in the art will appreciate, these processing modules may be implemented by dedicated hardware circuits or they may be implemented using a general purpose processor controlled by software instructions. The software instructions may be programmed in advance into the camera or they may be purchased later and, for example, downloaded from a website into the camera.

In the embodiment described above, the extended periodicity determining unit processed the periodicity measures stored in the buffer to detect a sustained voiced signal with a minimum duration of one second. As those skilled in the art will appreciate, the minimum duration does not have to be one second to detect a sustained voiced signal. However, if the minimum duration is too short, then this may result in too many detections and if the minimum duration is too long, then this may be inconvenient for the user. In a preferred embodiment, therefore, the minimum duration is set between half a second and one and a half seconds.

1. An apparatus for controlling a device, the apparatus comprising: areceiver operable to receive a speech signal; a divider operable todivide the speech signal-into a sequence of speech frames, each speechframe representing a time portion of the speech signal; a processoroperable to process each speech frame to determine a measure ofperiodicity of the portion of the speech signal represented by thespeech frame; a detector operable to detect an extended periodic portionwithin the received speech signal using the periodicity measures from aplurality of successive speech frames; and a controller operable tocontrol said device in dependence upon a detection made by saiddetector.
 2. An apparatus according to claim 1, wherein said detectorcomprises a combiner operable to combine the periodicity measures fromsaid plurality of successive speech frames.
 3. An apparatus according toclaim 2, wherein said detector is operable to combine said plurality ofperiodicity measures to calculate a measure of the variability of theperiodicity measures for said successive speech frames.
 4. An apparatusaccording to claim 2, wherein said detector is operable to combine saidplurality of periodicity measures to calculate a plurality of measuresof the variability of the periodicity measures for said successivespeech frames and is operable to combine the plurality of variabilitymeasures and to detect said extended periodic portion from the combinedvariability measure.
 5. An apparatus according to claim 4, wherein saiddetector is operable to linearly combine said plurality of variabilitymeasures.
 6. An apparatus according to claim 4, wherein said detector isoperable to combine said plurality of variability measures in anon-linear manner.
 7. An apparatus according to claim 2, wherein saiddetector is operable to compare the combined periodicity measures with athreshold value.
 8. An apparatus according to claim 1, wherein saidprocessor is operable to determine an auto-correlation function for eachspeech frame.
 9. An apparatus according to claim 8, wherein saidprocessor comprises a peak determiner operable to locate a maximumauto-correlation value within a predetermined part of saidauto-correlation function.
 10. An apparatus according to claim 9,wherein said predetermined part corresponds to an input frequency of thespeech signal in the range of 100 Hertz to 300 Hertz.
 11. An apparatusaccording to claim 9, wherein said processor is operable to calculate:$v_{i} = \frac{A_{i}( L_{MAX} )}{A_{i}(0)}$ where A(0) is thevalue of the auto-correlation function for a zero shift and A(L_(max))is said maximum auto-correlation value determined by said peakdeterminer.
 12. An apparatus according to claim 1, further comprising ananalogue to digital converter operable to sample the received speechsignal.
 13. An apparatus according to claim 12, wherein said divider isoperable to divide said speech signal so that each frame comprises thesame number of samples.
 14. An apparatus according to claim 1, furthercomprising a microphone and wherein said receiver is operable to receivesaid speech signal from said microphone.
 15. An apparatus according toclaim 1, wherein said device is an image capture device and wherein saidcontroller is operable to activate the image capture device to capturean image.
 16. An apparatus according to claim 1, wherein said detector is operable to process the periodicity measures from successive speech frames corresponding to a predetermined duration of speech.
 17. An apparatus according to claim 16, wherein said detector is operable to process the periodicity measures from successive speech frames corresponding to a portion of speech having a duration between half a second and one and a half seconds.
 18. An apparatus for controlling a device, the apparatus comprising: a receiver operable to receive a speech signal; a detector operable to detect an extended periodic portion within the received speech signal; and a controller operable to control said device in dependence upon a detection made by said detector.
 19. An image capture device comprising: a receiver operable to receive a speech signal; a detector operable to detect an extended periodic portion within the received speech signal; and a controller operable to control the image capture device to capture an image in dependence upon a detection made by said detector.
 20. A method of controlling a device, the method comprising: receiving a speech signal; dividing the speech signal into a sequence of speech frames, each speech frame representing a time portion of the speech signal; processing each speech frame to determine a measure of periodicity of the portion of the speech signal represented by the speech frame; detecting an extended periodic portion within the received speech signal using the periodicity measures from a plurality of successive speech frames; and controlling said device in dependence upon a detection made by said detecting step.
 21. A method according to claim 20, wherein said detecting step combines the periodicity measures from said plurality of successive speech frames.
 22. A method according to claim 21, wherein said detecting step combines said plurality of periodicity measures to calculate a measure of the variability of the periodicity measures for said successive speech frames.
 23. A method according to claim 21, wherein said detecting step: combines said plurality of periodicity measures to calculate a plurality of measures of the variability of the periodicity measures for said successive speech frames; and combines the plurality of variability measures and detects said extended periodic portion from the combined variability measure.
 24. A method according to claim 23, wherein said detecting step linearly combines said plurality of variability measures.
 25. A method according to claim 23, wherein said detecting step combines said plurality of variability measures in a non-linear manner.
 26. A method according to claim 21, wherein said detecting step compares the combined periodicity measures with a threshold value.
 27. A method according to claim 20, wherein said processing step determines an auto-correlation function for each speech frame.
 28. A method according to claim 27, wherein said processing step comprises a peak determining step to locate a maximum auto-correlation value within a predetermined part of said auto-correlation function.
 29. A method according to claim 28, wherein said predetermined part corresponds to an input frequency of the speech signal in the range of 100 Hertz to 300 Hertz.
 30. A method according to claim 28, wherein said processing step calculates: $v_{i} = \frac{A_{i}(L_{MAX})}{A_{i}(0)}$ where $A_{i}(0)$ is the value of the auto-correlation function for a zero shift and $A_{i}(L_{MAX})$ is said maximum auto-correlation value determined by said peak determining step.
 31. A method according to claim 20, further comprising the step of sampling the received speech signal.
 32. A method according to claim 31, wherein said dividing step divides said speech signal so that each frame comprises the same number of samples.
 33. A method according to claim 20, wherein said receiving step receives said speech signal from a microphone.
 34. A method according to claim 20, wherein said device is an image capture device and wherein said controlling step activates the image capture device to capture an image.
 35. A method according to claim 20, wherein said detecting step uses the periodicity measures from a plurality of successive speech portions corresponding to a predetermined duration of speech.
 36. A method according to claim 35, wherein said detecting step is operable to use the periodicity measures from successive speech frames corresponding to a portion of speech having a duration of between half a second and one and a half seconds.
 37. A computer readable medium storing computer executable instructions for causing a programmable computer device to perform a controlling method, the computer executable instructions comprising instructions for: receiving a speech signal; dividing the speech signal into a sequence of speech frames, each speech frame representing a time portion of the speech signal; processing each speech frame to determine a measure of periodicity of the portion of the speech signal represented by the speech frame; detecting an extended periodic portion within the received speech signal using the periodicity measures from a plurality of successive speech frames; and controlling said device in dependence upon a detection made by said detecting step.
 38. Computer executable instructions for controlling a programmable computer device to perform a controlling method, the computer executable instructions comprising instructions for: receiving a speech signal; dividing the speech signal into a sequence of speech frames, each speech frame representing a time portion of the speech signal; processing each speech frame to determine a measure of periodicity of the portion of the speech signal represented by the speech frame; detecting an extended periodic portion within the received speech signal using the periodicity measures from a plurality of successive speech frames; and controlling said device in dependence upon a detection made by said detecting step.
 39. An apparatus for controlling a device, the apparatus comprising: means for receiving a speech signal; means for dividing the speech signal into a sequence of speech frames, each speech frame representing a time portion of the speech signal; means for processing each speech frame to determine a measure of periodicity of the portion of the speech signal represented by the speech frame; means for detecting an extended periodic portion within the received speech signal using the periodicity measures from a plurality of successive speech frames; and means for controlling said device in dependence upon a detection made by said detecting means.
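By way of illustration only, the steps recited in the method claims (dividing into equal-length frames, computing a per-frame auto-correlation, locating its peak within the lags corresponding to 100 Hertz to 300 Hertz, forming the ratio $v_{i} = A_{i}(L_{MAX})/A_{i}(0)$, and comparing a combination of successive measures with a threshold) might be sketched as follows. The function names, the frame length, the threshold value and the use of a simple mean as the combining rule are assumptions made for this sketch and are not taken from the claims, which leave the combination and the threshold open.

```python
def frame_signal(samples, frame_len):
    """Divide the sampled signal into consecutive frames of equal length."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def autocorr(frame, lag):
    """Auto-correlation of one frame at the given shift (lag)."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def periodicity(frame, sample_rate, f_lo=100.0, f_hi=300.0):
    """Return v_i = A_i(L_MAX) / A_i(0) for one frame.

    L_MAX is the lag of the largest auto-correlation value among the lags
    whose corresponding frequency lies between f_lo and f_hi.
    """
    a0 = autocorr(frame, 0)
    if a0 == 0:
        return 0.0  # silent frame: treat as non-periodic
    lag_min = int(sample_rate / f_hi)
    lag_max = min(int(sample_rate / f_lo), len(frame) - 1)
    peak = max(autocorr(frame, lag) for lag in range(lag_min, lag_max + 1))
    return peak / a0


def detect_extended_periodic(samples, sample_rate, frame_len, n_frames, threshold):
    """Detect an extended periodic portion.

    Assumed combining rule (hypothetical): the mean of the periodicity
    measures over n_frames successive frames must exceed the threshold.
    """
    v = [periodicity(f, sample_rate) for f in frame_signal(samples, frame_len)]
    for start in range(len(v) - n_frames + 1):
        window = v[start:start + n_frames]
        if sum(window) / n_frames > threshold:
            return True
    return False
```

With an 8 kHz sampling rate and 256-sample frames, choosing `n_frames=15` makes the detection window span roughly half a second of speech, the lower end of the duration range recited in claims 17 and 36; the result of `detect_extended_periodic` would then stand in for the detection upon which the controlling step depends.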